71 research outputs found

    Realistic multi-microphone data simulation for distant speech recognition

    The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology. A reliable, flexible and computationally inexpensive data simulation process may ultimately allow researchers to train, tune and test different techniques in a variety of acoustic scenarios, avoiding the laborious effort of directly recording real data in the targeted environment. In the last decade, several simulated corpora have been released to the research community, including the datasets distributed in the context of projects and international challenges such as CHiME and REVERB. These efforts were extremely useful for deriving baselines and common evaluation frameworks for comparison purposes. At the same time, in many cases they highlighted the need for better coherence between real and simulated conditions. In this paper, we examine this issue and describe our approach to the generation of realistic corpora in a domestic context. Experimental validation, conducted in a multi-microphone scenario, shows that a comparable performance trend can be observed with both real and simulated data across different recognition frameworks, acoustic models, and multi-microphone processing techniques. Comment: Proc. of Interspeech 201

    Multiple Source Localization Based on Acoustic Map De-Emphasis

    This paper describes a novel approach for the localization of multiple sources overlapping in time. The proposed algorithm relies on acoustic maps computed in multi-microphone settings, which describe the distribution of acoustic activity in a monitored area. Through proper processing of the acoustic maps, the positions of two or more simultaneously active acoustic sources can be estimated in a robust way. Experimental results obtained on real data collected for this specific task show the capabilities of the method both with distributed microphone networks and with compact arrays.
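The idea of finding several sources from one acoustic map can be illustrated with a deliberately simplified sketch. The abstract does not specify how the de-emphasis is carried out, so everything below is an assumption for illustration: the map is a 2-D array of activity scores, the dominant peak is picked first, and the map itself is then attenuated around that peak with a Gaussian-shaped weight before searching for the next peak. The name `locate_sources` and the parameters `n_sources` and `sigma` are hypothetical.

```python
import numpy as np

def locate_sources(acoustic_map, n_sources=2, sigma=2.0):
    """Pick n_sources peaks from a 2-D map, de-emphasizing each found peak.

    Illustrative simplification only: the attenuation is applied directly
    to the map values, which is an assumption, not the published method.
    """
    work = np.array(acoustic_map, dtype=float)  # copy so the input is untouched
    rows, cols = np.indices(work.shape)
    peaks = []
    for _ in range(n_sources):
        # Position of the currently dominant peak
        r, c = np.unravel_index(np.argmax(work), work.shape)
        peaks.append((int(r), int(c)))
        # Weight that is 0 at the peak and approaches 1 far away from it
        dist2 = (rows - r) ** 2 + (cols - c) ** 2
        work *= 1.0 - np.exp(-dist2 / (2.0 * sigma ** 2))
    return peaks
```

In this sketch `sigma` controls how wide a region around each detected peak is suppressed; too small a value may let the side of the first peak be picked again, too large a value may hide a nearby second source.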

    Efficient time delay estimation based on cross-power spectrum phase

    Accurate Time Delay Estimation for acoustic signals acquired in noisy and reverberant environments is an important task in many speech processing applications. The Cross-power Spectrum Phase analysis is a popular method that has been demonstrated to perform well even in moderately adverse conditions. This paper describes an efficient approach to applying it in the case of static sources. It exploits the linearity of the generalized cross-correlation to accumulate information from multiple frames in the time domain. This translates into a reduced computational load and a more robust estimation. Several examples drawn from real and simulated data in typical applications are discussed.
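A minimal sketch of how accumulation over frames might look, assuming the Cross-power Spectrum Phase is computed as a PHAT-weighted cross-correlation per frame and the per-frame functions are simply summed before peak picking. The abstract does not give the exact procedure, so the function names (`csp_frame`, `accumulated_delay`), the frame length, the hop and the lag range are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def csp_frame(x1, x2, eps=1e-12):
    """PHAT-weighted cross-correlation of one pair of frames."""
    n = 2 * len(x1)                      # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = np.conj(X1) * X2             # cross-power spectrum
    cross /= np.abs(cross) + eps         # keep the phase only (PHAT weighting)
    return np.fft.irfft(cross, n)        # back to the lag domain

def accumulated_delay(sig1, sig2, frame_len=1024, hop=512, max_lag=64):
    """Sum per-frame CSP functions, then pick the lag of the highest peak."""
    acc = np.zeros(2 * frame_len)
    for start in range(0, len(sig1) - frame_len + 1, hop):
        acc += csp_frame(sig1[start:start + frame_len],
                         sig2[start:start + frame_len])
    # Reorder so that lags run from -max_lag to +max_lag
    lags = np.arange(-max_lag, max_lag + 1)
    csp = np.concatenate((acc[-max_lag:], acc[:max_lag + 1]))
    return int(lags[np.argmax(csp)]), csp
```

With this sign convention, a positive lag means the second signal is delayed relative to the first. Summing before peak picking is only meaningful when the source does not move during the accumulation window, which matches the static-source case the abstract refers to.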

    Talker Tracking and Speech Acquisition using two Microphone Pairs and a Crosspower Spectrum Phase Analysis

    Detection, localization and enhancement of a generic acoustic message produced in a noisy environment can be accomplished by means of a microphone array. A Crosspower Spectrum Phase analysis and the corresponding Coherence Measure allow an accurate time delay estimation, which is employed in source localization. Once the source position is estimated, an enhanced version of the original acoustic message is derived, which can serve as the input to a speech recognition system. Preliminary results in terms of talker localization are presented.

    Localization of acoustic sources in a noisy and reverberant environment (Localizzazione di sorgenti acustiche in ambiente rumoroso e riverberante)

    A linear array of four microphones makes it possible to localize acoustic events produced in a real environment, using an accurate delay estimator. Analyzing the phase information of the cross-power spectrum of pairs of signals leads to high localization accuracy. The behavior of this technique was studied under various conditions of noise and ambient reverberation.

    Acoustic Source Location in Noisy and Reverberant Environment Using CSP Analysis

    A linear four-microphone array can be employed for acoustic event location in a real environment using an accurate time delay estimation. This paper refers to the use of a specific technique, based on Crosspower Spectrum Phase (CSP) analysis, that yielded accurate location performance. The behavior of this technique is investigated under different noise and reverberation conditions. Real experiments as well as simulations were conducted to analyze a wide variety of situations. Results show system robustness under quite critical environmental conditions. 1. INTRODUCTION The development of microphone array technology [1] is a fundamental step for the advancement of hands-free speech recognition [2], teleconferencing and acoustic surveillance systems [3]. These applications require capabilities such as automatic talker location [4] and spatially selective pickup of sound [5], which can be performed by processing the acoustic signals supplied by a multichannel acquisition setup. Acoustic ..

    Acoustic Event Localization Using a Crosspower-Spectrum Phase Based Technique

    Linear microphone arrays can be employed for acoustic event localization in a noisy environment using time delay estimation. Three delay estimation techniques are investigated, namely Normalized Cross Correlation, LMS Adaptive Filters and Crosspower-Spectrum Phase. They are combined with a bidimensional representation, the Coherence Measure, in order to emphasize information that can be exploited for estimating the position of both non-moving and moving acoustic sources. To compare the techniques, different acoustic sources were considered, which generated events at different positions in space. Expressing performance in terms of accuracy of the wavefront direction angle, experiments showed that the Crosspower-Spectrum Phase based technique outperforms the other two. This technique also provided very promising preliminary results in terms of source position estimation.
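To make the link between an estimated delay and the wavefront direction angle concrete, here is a small sketch under the usual far-field (plane wave) assumption for a single microphone pair. The abstract does not state the geometry or the angle convention used in the experiments, so the function name `direction_angle_deg`, the speed of sound default and the choice of measuring the angle from broadside are assumptions made for illustration.

```python
import math

def direction_angle_deg(delay_samples, fs, mic_distance, c=343.0):
    """Far-field direction angle (degrees from broadside) for one mic pair.

    delay_samples: inter-microphone delay in samples (may be fractional)
    fs:            sampling rate in Hz
    mic_distance:  spacing between the two microphones in metres
    c:             assumed speed of sound in m/s
    """
    tau = delay_samples / fs                  # delay in seconds
    s = c * tau / mic_distance                # sine of the arrival angle
    s = max(-1.0, min(1.0, s))                # clamp against estimation noise
    return math.degrees(math.asin(s))
```

A zero delay corresponds to a wavefront arriving from broadside, and the largest physically possible delay, `mic_distance / c`, corresponds to an arrival along the array axis. Because the delay is usually an integer number of samples, the angular resolution depends on both the sampling rate and the microphone spacing.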